Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT

نویسندگان

  • Yiming Cui
  • Conghui Zhu
  • Xiaoning Zhu
  • Tiejun Zhao
چکیده

Pivot language is employed as a way to solve the data sparseness problem in machine translation, especially when the data for a particular language pair does not exist. The combination of source-to-pivot and pivot-to-target translation models can induce a new translation model through the pivot language. However, the errors in two models may compound as noise, and still, the combined model may suffer from a serious phrase sparsity problem. In this paper, we directly employ the word lexical model in IBM models as an additional resource to augment pivot phrase table. In addition, we also propose a phrase table pruning method which takes into account both of the source and target phrasal coverage. Experimental result shows that our pruning method significantly outperforms the conventional one, which only considers source side phrasal coverage. Furthermore, by including the entries in the lexicon model, the phrase coverage increased, and we achieved improved results in Chinese-to-Japanese translation using English as pivot language.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language

This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons between language pairs Lf–Lp and Lp–Le, we assume these lexicons as parallel corpora. Then, we merge the extracted two phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...

متن کامل

Literature Survey: Study of Reordering in Pivot Based SMT

Pivot Based SMT solves the problem of scarcity of source-target parallel corpus by introducing a third resource rich ‘pivot’ language. Triangulation method in Pivot Based SMT is a method that uses the pivot language to induce new phrase pairs into the phrase table, this process is known as ‘Phrase Table Triangulation’. Phrase Table Triangulation has been extensively studied by many researchers....

متن کامل

A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation

We compare two pivot strategies for phrase-based statistical machine translation (SMT), namely phrase translation and sentence translation. The phrase translation strategy means that we directly construct a phrase translation table (phrase-table) of the source and target language pair from two phrase-tables; one constructed from the source language and English and one constructed from English a...

متن کامل

Triangulation of Reordering Tables: An Advancement Over Phrase Table Triangulation in Pivot-Based SMT

Triangulation in Pivot-Based Statistical Machine Translation(SMT) is a very effective method for building Machine Translation(MT) systems in case of scarcity of the parallel corpus. Phrase Table Triangulation helps in such a resource constrained setting by inducing new phrase pairs with the help of a pivot. However, it does not explore the possibility of extracting reordering information throug...

متن کامل

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low quality translations. In this paper, we present two language-independent features...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1512.00170  شماره 

صفحات  -

تاریخ انتشار 2015